DATASET

The dataset used for illustration is derived from NYC restaurant violation results. It gives the proportion of restaurants graded A, B, and C in each zip code in a given year, computed from the individual restaurant records within each zip code. The columns are as follows:

Year - Inspection Year

ZipCode - NYC Zip code

Aprop - Proportion of A-grade restaurants in a zip code for the given year

Bprop - Proportion of B-grade restaurants in a zip code for the given year

Cprop - Proportion of C-grade restaurants in a zip code for the given year
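
As a rough illustration of how such proportions could be derived from restaurant-level records, the sketch below uses base R on a small made-up table. The `inspections` data frame, its `grade` column, and the `zip_year` label are assumptions for illustration and are not taken from the actual dataset.

```r
# Hypothetical toy data: one row per graded restaurant (not the real NYC data).
inspections <- data.frame(
  ZIPCODE = c(10001, 10001, 10001, 10001, 10002, 10002),
  year    = c(2013, 2013, 2013, 2013, 2013, 2013),
  grade   = c("A", "A", "B", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# Count restaurants per zip code/year (rows) and grade (columns).
counts <- table(
  zip_year = paste(inspections$ZIPCODE, inspections$year, sep = "_"),
  grade    = factor(inspections$grade, levels = c("A", "B", "C"))
)

# Divide each row by its total so the three proportions sum to 1.
props <- prop.table(counts, margin = 1)

grade_props <- data.frame(
  zip_year = rownames(props),
  aprop    = as.numeric(props[, "A"]),
  bprop    = as.numeric(props[, "B"]),
  cprop    = as.numeric(props[, "C"]),
  stringsAsFactors = FALSE
)
print(grade_props)
```

Fixing the factor levels to A, B, and C keeps a column for every grade even when a zip code has no restaurants with that grade, so the proportion is reported as 0 rather than being dropped.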

CHOROPLETH PLOTS IN TABLEAU

STEP 1: FILTER

Filter the dataset by dragging the “year” column from Dimensions to Filters. Click the drop-down and edit the filter to select the year you want.

STEP 2: CONVERT ZIPCODE TO GEOGRAPHIC COLUMN

Add zip code to Marks and you should see a map. Tableau automatically detects this as a geographical column, but you can verify this using the drop-downs as shown below.

STEP 3: Add the measure of Aprop to Marks.

After you add it to Marks, it will appear as a label. To plot it on the map, convert it to a continuous dimension using the drop-down as shown.

STEP 4: Color code by proportions

Change how Aprop is displayed on the map from label to color by clicking on the label icon (just before Aprop, as shown).

STEP 5: Change color palette

You can change the color palette by editing colors. If your data is discrete, you can specify the number of steps and change the color palette using a simple drop-down.

After this, you will have an image map like below.

STEP 6: Show color legend

You can hide the plot hints by unchecking “Show Me” (top right). This will unhide the labels. You can hover over various zip codes and see the measure (in this example, it’s Aprop, the proportion of A grades in the filtered year).

STEP 7: Comparing years

Above, we filtered to Year=2013. However, if we remove this filter and instead add year to Rows or Columns, we can view the same proportions across years. Note that this will also switch the Show Me chart type from map to Gantt.

CHOROPLETH PLOTS IN R

STEP 1: Load in necessary libraries, and your data

There are several different libraries that allow you to map data in R. choroplethrZip lets you map based on zip codes, a capability that not all mapping libraries have. Install this library from GitHub, then restart R after the installation.

# Run once to install (requires the devtools package):
# devtools::install_github('arilamstein/choroplethrZip@v1.5.0')
library(choroplethrZip)

Load these libraries too.

library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
library(devtools)
library(RColorBrewer)
library(ggplot2)

Load in your data set. In this example we will map the “aprop” values for the year 2013.

mapdf <- read_csv("zipdataGradePercentage.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   X1 = col_integer(),
##   ZIPCODE = col_integer(),
##   year = col_integer(),
##   key = col_character(),
##   aprop = col_double(),
##   bprop = col_double(),
##   cprop = col_double()
## )
# Keep only 2013, the zip code and the A-grade proportion, and drop rows with missing values
mapdf_a_13 <- mapdf %>% filter(year == 2013) %>% select(ZIPCODE, aprop) %>% na.omit()

STEP 2: Adjust the column names

choroplethrZip maps based on columns named “region” and “value”, so we must add columns with these names to our data frame. The zip codes in “region” must also be stored as character strings.

mapdf_a_13$region <- as.character(mapdf_a_13$ZIPCODE)
mapdf_a_13$value <- mapdf_a_13$aprop
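
The renaming step can also be wrapped in a small helper, sketched below in base R. The function name `to_region_value` and the toy data are hypothetical. The zero-padding and the uniqueness check reflect assumptions about what a zip-level mapping function typically expects (five-character zip strings, one row per region), not documented requirements quoted from the package.

```r
# Hypothetical helper: build a region/value data frame from arbitrary columns.
to_region_value <- function(df, zip_col, value_col) {
  out <- data.frame(
    # Zip codes read as numbers lose leading zeros; pad back to 5 characters.
    region = sprintf("%05d", as.integer(df[[zip_col]])),
    value  = as.numeric(df[[value_col]]),
    stringsAsFactors = FALSE
  )
  # Drop rows with no value to plot.
  out <- out[!is.na(out$value), , drop = FALSE]
  # Assumption: each region should appear only once.
  stopifnot(!anyDuplicated(out$region))
  out
}

# Toy input (made-up values); 7030 stands in for a zip code with a leading zero.
toy <- data.frame(ZIPCODE = c(10001L, 10002L, 7030L),
                  aprop   = c(0.91, NA, 0.80))
prepared <- to_region_value(toy, "ZIPCODE", "aprop")
print(prepared)
```

For the NYC data the padding makes no difference, since every zip code already has five digits, but it avoids silent mismatches if the same code is reused for other areas.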

STEP 3: If you run into an error…

You might see an error if not all the zip codes in your data set are in choroplethrZip. Filter out the non-mappable ones (the error message appears when you first try to map, and it tells you which zip codes are the problem).

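# Zip codes in the data that choroplethrZip cannot draw (stored as character, like region)
not_mappable <- c("10057", "10104", "10105", "10121", "10176", "10179", "10285", "11249")
mapdf_a_13_edit <- mapdf_a_13 %>% filter(!region %in% not_mappable)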
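
Instead of copying the problem zip codes from the error message, the list could be computed by comparing your zip codes with the ones the package knows about. The sketch below uses made-up vectors so that it runs on its own. With choroplethrZip installed, `mappable_regions` could presumably be taken from the package's `zip.regions` data set, but that is an assumption that has not been checked here.

```r
# Zip codes present in our data (toy values for illustration).
zips_in_data <- c("10001", "10002", "10057", "11249")

# Stand-in for the zip codes the mapping package can draw.
# Assumption: with choroplethrZip loaded this might be
#   data(zip.regions); mappable_regions <- unique(zip.regions$region)
mappable_regions <- c("10001", "10002", "10003")

# Zip codes in the data that cannot be mapped.
not_mappable <- setdiff(zips_in_data, mappable_regions)
print(not_mappable)

# Keep only the rows that can be mapped.
toy <- data.frame(region = zips_in_data,
                  value  = c(0.91, 0.85, 0.70, 0.95),
                  stringsAsFactors = FALSE)
toy_edit <- toy[!toy$region %in% not_mappable, , drop = FALSE]
print(toy_edit)
```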

STEP 4: Map with a discrete color scheme

In the graph below, the data is mapped using discrete buckets (5 buckets chosen here). choroplethrZip splits the data so that each bucket holds roughly the same number of zip codes. As a result, the buckets are not equal in width, which can be misleading when reading changes in color. Below, the first bucket ranges from 0 to 0.676, whereas the last bucket only ranges from 0.94 to 1.00. This data set only includes the NYC area, so the map is zoomed in on the zips in the data frame. choroplethrZip can zoom by zip code, state, MSA, or county.

zip_choropleth(mapdf_a_13_edit, zip_zoom = mapdf_a_13_edit$region, 
               title='Proportion of A Grade Received by a given zip code (2013) \n (discrete scale)', legend="Proportion", num_colors = 5) 
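
To see why equal-count buckets end up with unequal widths, the base R sketch below compares quantile-style breaks with equal-width breaks on a made-up vector of proportions. The values are invented for illustration, and `quantile()` and `cut()` only approximate whatever bucketing the package does internally.

```r
# Made-up proportions, skewed toward 1 like the A-grade data.
value <- c(0.40, 0.62, 0.70, 0.78, 0.82, 0.85, 0.87, 0.90,
           0.92, 0.94, 0.95, 0.96, 0.97, 0.98, 1.00)

# Equal-count buckets: breaks at the 0%, 20%, ..., 100% quantiles.
quantile_breaks <- quantile(value, probs = c(0, 0.2, 0.4, 0.6, 0.8, 1))
quantile_bucket <- cut(value, breaks = quantile_breaks, include.lowest = TRUE)

# Equal-width buckets: cut() with a bucket count splits the range evenly.
equal_bucket <- cut(value, breaks = 5)

# Same number of values per bucket, but very different bucket widths.
print(table(quantile_bucket))
quantile_widths <- diff(as.numeric(quantile_breaks))
print(round(quantile_widths, 3))

# Same width per bucket, but very different counts.
print(table(equal_bucket))
```

With skewed data like this, the equal-count version gives a wide first bucket and narrow buckets near 1, which is the pattern described above.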

STEP 5: Map with a continuous color scheme

Since the values in this data set are proportions, the below graph uses a continuous scale. Setting “num_colors” to 1 will change the scale to continuous. The color scheme was chosen to mirror the Tableau graph.

zip_choropleth(mapdf_a_13_edit, zip_zoom = mapdf_a_13_edit$region, 
               title='Proportion of A Grade Received by a given zip code (2013) \n (continuous scale)', legend="Proportion", num_colors = 1) +
  scale_fill_continuous(low="#B8DEF3", high="#285685", guide="colorbar", na.value="white")
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
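
As a rough picture of what a continuous scale does, the sketch below maps a few made-up values onto a gradient between the same two hex colors using base R's `colorRamp()`. This is only an approximation: it interpolates in RGB space, and ggplot2's own interpolation may differ slightly for values between the endpoints.

```r
low  <- "#B8DEF3"  # color for the smallest value
high <- "#285685"  # color for the largest value

# colorRamp() returns a function mapping numbers in [0, 1] to RGB triples.
ramp <- colorRamp(c(low, high))

# Made-up proportions, rescaled so the minimum is 0 and the maximum is 1.
value  <- c(0.40, 0.70, 0.85, 1.00)
scaled <- (value - min(value)) / (max(value) - min(value))

# Convert the interpolated RGB triples to hex color strings.
fill <- rgb(ramp(scaled), maxColorValue = 255)
print(data.frame(value = value, fill = fill))
```

Because every value gets its own shade, equal differences in value correspond to roughly equal steps along the gradient, which avoids the uneven-bucket issue of the discrete map.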

COMPARING R AND TABLEAU

  1. Maps in Tableau are drag-and-drop based. Unlike R, no scripting is required for plotting with Tableau, and it provides an intuitive GUI for plotting.

  2. Tableau automatically handles missing data and elements that cannot be plotted. For example, it automatically ignored certain zip codes that do not exist (probably data noise). With choroplethrZip in R, you need to manually remove zip codes that are in the dataset but not in the package. While R will prompt you with an error message, this is another step to consider when mapping. Keep in mind that although Tableau removes NA values automatically, this may not be optimal when missing or invalid data points are important information that you want to include in the graph.

  3. Tableau automatically picks a sensible color fill scheme (continuous or discrete) based on the data type. While you can set the color steps (number of breaks), there is no option to set unequal (custom) color breaks. As mentioned above, with choroplethrZip in R you can set the number of breaks, but the resulting intervals will not be of equal width. This can be very misleading, because equal steps in color do not correspond to equal steps in value.
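
On the second point, one way to keep dropped data visible in R is to record what is removed, and why, before mapping. The sketch below uses base R on made-up data. The `reason` column and the toy zip codes are hypothetical.

```r
# Toy region/value data with one missing value and one unmappable zip code.
toy <- data.frame(region = c("10001", "10002", "10057", "10003"),
                  value  = c(0.90, NA, 0.80, 0.70),
                  stringsAsFactors = FALSE)
not_mappable <- c("10057")

# Flag rows that will not appear on the map.
missing_value <- is.na(toy$value)
unmappable    <- toy$region %in% not_mappable

# Keep a record of the dropped rows and the reason for each.
dropped <- toy[missing_value | unmappable, , drop = FALSE]
dropped$reason <- ifelse(is.na(dropped$value), "missing value", "not mappable")
print(dropped)

# Rows that remain for mapping.
kept <- toy[!(missing_value | unmappable), , drop = FALSE]
print(kept)
```

Reporting the dropped rows alongside the map lets readers judge whether the missing zip codes matter for the comparison.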